The objective of this notebook is to use the Lorenz attractor to study the dynamics of the system and to gain insight into the behaviour and characteristics of a chaotic system.
In this section we aim to investigate how the Lorenz system’s state transitions from one state to another at each time stamp. By studying these state transitions, we seek to identify patterns, trends, and potential underlying dynamics within the system. This analysis can provide valuable insights into the behavior and evolution of the system, enabling us to understand its characteristics and potentially make predictions or interpretations based on the observed patterns.
We follow these steps to perform the experiment:
We selected the Lorenz attractor dataset to
explore the complex behaviour of the Lorenz attractor and imported
it from a CSV file.
We split the data into a training dataset containing 80% of the observations (160,000 points) and a validation dataset containing the remaining 20% (40,000 points).
Next, we visualized the Lorenz attractor (overlapping spirals) in 3D space.
To visualize the Lorenz attractor in 2D space, we used the HVT algorithm to compress the data. We started with a certain number of cells and gradually increased it until the desired compression percentage of 80% was achieved, at which point the overlapping spirals are clearly visible in 2D space.
Finally, with the model ready, we predicted which cell and which level each point belongs to, using the full dataset of 200,000 data points for prediction to avoid unexplored Cell IDs.
The Lorenz attractor is a three-dimensional figure generated by a set of differential equations that model a simple chaotic dynamic system of convective flow. It arises from a simplified set of equations describing the behaviour of a system involving three variables. These variables represent the state of the system at any given time and are typically denoted (x, y, z). The equations are as follows:
\[ dx/dt = \sigma (y - x) \] \[ dy/dt = x (r - z) - y \] \[ dz/dt = x y - \beta z \] where dx/dt, dy/dt, and dz/dt represent the rates of change of x, y, and z respectively over time (t). σ, r, and β are constant parameters of the system: σ (σ = 10) controls the rate of convection, r (r = 28) controls the difference in temperature between the convective and stable regions, and β (β = 8/3) represents the ratio of the width to the height of the convective layer. When these equations are plotted in three-dimensional space, they produce a chaotic trajectory that never repeats. The Lorenz attractor exhibits sensitive dependence on initial conditions: even small differences in the starting conditions can lead to drastically different trajectories over time. This sensitivity to initial conditions is a defining characteristic of chaotic systems.
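For readers who want to regenerate a trajectory like the one in this dataset, the equations above can be integrated numerically. A minimal sketch using the deSolve package (an assumed dependency; the initial state (0, 1, 20) and the 0.00025 time step are taken from the first rows of the dataset shown later):

```r
library(deSolve)

# Lorenz system with the parameters used in this notebook
lorenz <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    dx <- sigma * (y - x)
    dy <- x * (r - z) - y
    dz <- x * y - beta * z
    list(c(dx, dy, dz))
  })
}

out <- deSolve::ode(
  y     = c(x = 0, y = 1, z = 20),       # initial state, as in the dataset
  times = seq(0, 50, by = 0.00025),      # matches the dataset's time step
  func  = lorenz,
  parms = c(sigma = 10, r = 28, beta = 8/3)
)
head(out)
```

Plotting the resulting x, y, z columns reproduces the familiar butterfly-shaped trajectory.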
In this section, we will use the
Lorenz Attractor Dataset. This dataset contains 200,000 observations and 5 columns. The dataset can be downloaded from
here
The dataset includes the following columns: X, Y, and Z (the coordinates of the system state), U (velocity), and t (timestamp).
Here, we load the data. Let’s explore the Lorenz Attractor Dataset. For the sake of brevity we are displaying first ten rows.
dataset <- read.csv("./sample_dataset/lorenze_attractor.csv")
dataset <- dataset %>% dplyr::select(X,Y,Z,U,t)
dataset$t <- round(dataset$t, 5)
DT::datatable(head(dataset,10),options = list(pageLength = 10, scrollX = TRUE), class = 'cell-border stripe')
Let's have a look at the training dataset containing 160,000 data points. For the sake of brevity we are displaying the first 10 rows.
noOfPoints <- dim(dataset)[1]
trainLength <- as.integer(noOfPoints * 0.8)
trainDataset <- dataset[1:trainLength,]
trainData <- trainDataset %>% dplyr::select(X,Y,Z)
DT::datatable(head(trainDataset,10),options = list(pageLength = 10, scrollX = TRUE), class = 'cell-border stripe')
Now, let us analyse the summary of the training dataset.
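The summary output is not reproduced here; it can be regenerated directly from the training data:

```r
# Five-number summary (plus mean) for each of the three coordinates
summary(trainDataset %>% dplyr::select(X, Y, Z))
```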
Let's have a look at the test dataset containing 40,000 data points. For the sake of brevity we are displaying the first 10 rows.
testDataset <- dataset[(trainLength+1):noOfPoints,]
DT::datatable(head(testDataset, 10),options = list(pageLength = 10, scrollX = TRUE), class = 'cell-border stripe')
Now, let us analyse the summary of the test dataset.
When the Lorenz attractor is visualized in a three-dimensional space, it forms a complex and intricate structure. It consists of a set of looping and spiraling curves that are confined within a specific region. The attractor has a butterfly-like shape, with two large wings and a narrow body connecting them.
Now let’s try to visualize the Lorenz attractor (overlapping spirals) in 3D Space.
data_3d <- dataset[sample(1:nrow(dataset), 1000), ]
plot_3d <- plotly::plot_ly(data_3d, x= ~X, y= ~Y, z = ~Z) %>% add_markers( marker = list(
size = 2,
symbol = "circle",
color = ~Z,
colorscale = "Bluered",
colorbar = (list(title = 'z_var'))))
plot_3d
Figure 1: Lorenz attractor in 3D space
We will use the HVT function to compress our data while
preserving the essential features of the dataset. Our goal is to achieve
data compression of at least 80%. In situations where the
compression ratio does not meet the desired target, we can adjust
the model parameters, for example by modifying the
quantization error threshold or
increasing the number of cells, and then rerunning the HVT
function.
We will pass the model parameters listed below, along with the Lorenz attractor
dataset, to the HVT function and check whether the desired compression
percentage is achieved.
Model Parameters
Let's have a look at the training dataset containing 160,000 data points. For the sake of brevity we are displaying the first 10 rows. Here, we exclude the U and t columns, so that compression takes place only for the X, Y, Z coordinates and not for U (velocity) and t (timestamp). After training, we merge the U and t columns back into the dataset for prediction.
DT::datatable(head(trainData, 10),options = list(pageLength = 10, scrollX = TRUE), class = 'cell-border stripe')
Now, let us analyse the structure of the training dataset.
str(trainData)
#> 'data.frame': 160000 obs. of 3 variables:
#> $ X: num 0 0.0025 0.00499 0.00747 0.00995 ...
#> $ Y: num 1 1 1 0.999 0.999 ...
#> $ Z: num 20 20 20 20 19.9 ...
set.seed(240)
hvt.results <- HVT::HVT(
trainData,
n_cells = 100,
depth = 1,
quant.err = 0.1,
projection.scale = 10,
normalize = T,
distance_metric = "L1_Norm",
error_metric = "max",
quant_method = "kmeans"
)
Let's check out the compression summary.
| segmentLevel | noOfCells | noOfCellsBelowQuantizationError | percentOfCellsBelowQuantizationErrorThreshold | parameters |
|---|---|---|---|---|
| 1 | 100 | 0 | 0 | n_cells: 100 quant.err: 0.1 distance_metric: L1_Norm error_metric: max quant_method: kmeans |
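As noted earlier, if the compression target is not met, the number of cells can be increased and the HVT function rerun. A hypothetical sketch of that loop (the list slot holding the compression summary is an assumption and may differ across HVT package versions):

```r
# Hypothetical sketch: grow n_cells until the target compression is reached
target  <- 80
n_cells <- 100
repeat {
  hvt.results <- HVT::HVT(
    trainData, n_cells = n_cells, depth = 1, quant.err = 0.1,
    projection.scale = 10, normalize = TRUE,
    distance_metric = "L1_Norm", error_metric = "max",
    quant_method = "kmeans"
  )
  # Assumed location of the compression summary within the results list
  pct <- hvt.results[[3]]$compression_summary$percentOfCellsBelowQuantizationErrorThreshold
  if (pct >= target || n_cells >= 1000) break  # cap to avoid runaway loops
  n_cells <- n_cells + 100
}
```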
For better visualisation, let’s plot the Voronoi tessellation for 100 cells.
Figure 2: The Voronoi tessellation for layer 1 shown for the 100 cells in the dataset ’lorenz attractor’
Let's have a look at the dataset we use for prediction, which contains 200,000 data points. For the sake of brevity we are displaying the first 10 rows.
DT::datatable(head(dataset, 10),options = list(pageLength = 10, scrollX = TRUE), class = 'cell-border stripe')
Now, let us analyse the structure of the dataset we use for prediction.
str(dataset)
#> 'data.frame': 200000 obs. of 5 variables:
#> $ X: num 0 0.0025 0.00499 0.00747 0.00995 ...
#> $ Y: num 1 1 1 0.999 0.999 ...
#> $ Z: num 20 20 20 20 19.9 ...
#> $ U: num 0 0.0005 0.001 0.0015 0.002 ...
#> $ t: num 0 0.00025 0.0005 0.00075 0.001 0.00125 0.0015 0.00175 0.002 0.00225 ...
Now that we have built the model, let us predict which cell and which level each point belongs to, using the dataset prepared for prediction.
set.seed(240)
predictions <- HVT::predictHVT(
dataset,
hvt.results,
child.level = 1,
line.width = c(1.2),
color.vec = c("#141B41"),
quant.error.hmap = 0.1,
n_cells.hmap = 100
)
The Flow Map functions mentioned in the next section require the Cell ID from the prediction output and the sorted Timestamp from the dataset used for prediction, so we merge the two to get a modified data frame that pairs cell IDs with their respective timestamps.
Let's see which cell and level each point belongs to, along with the sorted Timestamp. For the sake of brevity, we will only show the first 10 rows.
scored_data <- predictions[["scoredPredictedData"]] %>%
round(2) %>% cbind(dataset) %>%
as.data.frame()
colnames(scored_data) <- c("Segment.Level", "Segment.Parent", "Segment.Child", "n", "Cell.ID", "Quant.Error", "pred_X", "pred_Y", "pred_Z", "centroidRadius", "diff", "anomalyFlag", "X", "Y", "Z", "U", "t")
DT::datatable(head(scored_data, 10),options = list(pageLength = 10, scrollX = TRUE), class = 'cell-border stripe')
# **Description - It serves as a tool for exploring and understanding temporal patterns and transitions in the data**
#
# This state_transition_plot function is designed to visualize and analyze sequential data representing state transitions. It takes as input a dataset with state information over time and generates different types of plots based on user preferences. Users can choose to create a timeseries plot of state transitions or a timeseries with lines connecting the state transitions. Additionally, the function allows for data sampling to focus on specific time periods.
#
# **Usage**
#
# > state_transition_plot(df, sample_size = 0.2, line_plot = FALSE, cellid_column = "Cell.ID", time_column = "t")
#
# **Arguments**
#
# * @param **df** (dataframe) - A dataframe containing the prediction output merged with the dataset used for the predictHVT function
# * @param **sample_size** (numeric) - The sampling fraction, ranging from 0.1 to 1. The highest value, 1, outputs a plot with the entire dataset. Sampling is taken from the end of the data
# * @param **line_plot** (logical) - If TRUE, the output is a timeseries plot with a line connecting the states according to the sample_size; otherwise, the output is a timeseries plot without a line, based on the sample_size
# * @param **cellid_column** (character) - The column name of the Cell ID in the dataframe passed to this function
# * @param **time_column** (character) - The column name of the Timestamp in the dataframe passed to this function
state_time_plot_result <- state_transition_plot(df = scored_data, cellid_column = "Cell.ID", time_column = "t")
state_time_plot_result
This function displays the probability of the Tplus1 states for every cell ID in the form of a table. For the sake of brevity we are displaying the probability table for Cell ID 1 only.
# **Description - It is useful for analyzing and visualizing state transition patterns in a dataset**
#
# The get_transition_probability_table function calculates transition probabilities for distinct states within a specified column of a dataframe (df). It computes the likelihood of transitioning from one state to another in sequential rows and presents the results as data frames in a list. Each data frame contains information about the next state (Tplus1_States), the frequency of this transition (Frequency), and the calculated transition probability (Probability). Additionally, the function displays these probability tables for each unique state and stores them in a global variable named trans_prob_df
#
# **Usage**
#
# > get_transition_probability_table(df, cellid_column = "Cell.ID", time_column = "t")
#
# **Arguments**
#
# * @param **df** (dataframe) - A dataframe containing the prediction output merged with the dataset used for the predictHVT function
# * @param **cellid_column** (character) - The column name of the Cell ID in the dataframe passed to this function
# * @param **time_column** (character) - The column name of the Timestamp in the dataframe passed to this function
get_transition_probability_table(df = scored_data, cellid_column = "Cell.ID", time_column = "t")
Probability table for Cell ID 1 :
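The probability tables above can also be reproduced manually; a minimal sketch for a single state (assuming the scored_data frame built earlier, with from_state chosen for illustration):

```r
# Order observations by time so consecutive rows are consecutive states
states <- scored_data[order(scored_data$t), "Cell.ID"]

# For one starting state, tabulate the states that immediately follow it
from_state  <- 1
next_states <- states[which(head(states, -1) == from_state) + 1]
freq        <- table(next_states)

# Same layout as the tables above: next state, frequency, probability
prob_table <- data.frame(
  Tplus1_States = as.integer(names(freq)),
  Frequency     = as.integer(freq),
  Probability   = as.numeric(freq) / sum(freq)   # relative frequency
)
```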
# **Description - It is used to generate and visualize transition probability matrices for state data**
#
# The reconcile_transition_probability function computes and visualizes transition probabilities for state data, it calculates transition probabilities between consecutive states in the input dataset, both with and without self-transitions. The function generates heatmap visualizations for these transition probabilities, providing insights into state transitions over time. It performs Markov Chain analysis on the data, producing transition matrices with and without self-transitions, along with corresponding heatmaps.
#
# **Usage**
#
# > reconcile_transition_probability(df, hmap_type = "All", cellid_column = "Cell.ID", time_column = "Timestamp")
#
# **Arguments**
#
# * @param **df** (dataframe) - A dataframe containing the prediction output merged with the dataset used for the predictHVT function
# * @param **hmap_type** (character) - If set to without_self_state, the output is reconciliation plots for manual and Markovchain highest transition probability excluding the self-state; if set to with_self_state, the output is reconciliation plots for manual and Markovchain highest transition probability including the self-state; if set to All, plots both including and excluding the self-state are output
# * @param **cellid_column** (character) - The column name of the Cell ID in the dataframe passed to this function
# * @param **time_column** (character) - The column name of the Timestamp in the dataframe passed to this function
reconcile_plots <- reconcile_transition_probability(df = scored_data, hmap_type = "All", cellid_column = "Cell.ID", time_column = "t")
The darker diagonal cells indicate higher probabilities of staying in the same state. These transitions represent situations where there is no change from the current state to the next state. Such states might be attractors in a dynamic system, where the system naturally tends to return to these states even after minor perturbations.
In this plot, the transitions suggest that the states tend to move to neighboring states more frequently. Proximity might not only refer to physical distance but also to similarities in attributes or conditions.
This heatmap uses the same data from the manual reconciliation process to determine the probabilities including the self-state, using the markovchainFit function.
This heatmap uses the same data from the manual reconciliation process to determine the probabilities excluding the self-state, using the markovchainFit function.
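The markovchainFit step referred to above can be sketched as follows (assuming the markovchain package is installed; the transitionMatrix slot is the one documented for markovchain objects):

```r
library(markovchain)

# Fit a first-order Markov chain to the time-ordered sequence of Cell IDs
states <- as.character(scored_data[order(scored_data$t), "Cell.ID"])
fit    <- markovchainFit(data = states, method = "mle")

# Row-stochastic transition matrix: rows are current states, columns next states
trans_matrix <- fit$estimate@transitionMatrix
rowSums(trans_matrix)  # each row sums to 1
```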
# **Description - It is designed for creating and visualizing flow maps based on input data**
#
# The generate_flow_maps function in R extracts centroid coordinates and probability data from input. It generates two types of flow maps, one based on the second-highest probability and another on the highest probability, using arrows to represent state transitions. Additionally, it offers optional animations to visualize transitions over time, either sorted by timestamps or based on the next state. Users can customize the type of maps and animations they want to create for exploring state transitions in their data.
#
# **Usage**
#
# > generate_flow_maps(hvt_model_output, transition_probability_df, hvt_plot_output, df, animation = "All", flow_map = "All", animation_speed = 2, threshold = 0.6, cellid_column = "Cell.ID", time_column = "t")
#
#
# **Arguments**
#
# * @param **hvt_model_output** (list) - The output list hierarchy from HVT model training. To get the centroid coordinates, we retrieve the second element, then within the second element, the first element's `1`. To get the Cell IDs, we retrieve the third element, then within the third element, the Cell.ID from the summary
# * @param **transition_probability_df** (dataframe) - A list of dataframes output by the get_transition_probability_table function
# * @param **hvt_plot_output** (list) - Base plot for the flow maps
# * @param **df** (dataframe) - A dataframe containing the prediction output merged with the dataset used for the predictHVT function
# * @param **animation** (character) - If set to time_based, the output is a dot animation of state transitions with sorted Timestamps. If set to state_based, the output is an arrow animation based on the highest-probability state excluding the self-state. If set to All, both animations are produced
# * @param **flow_map** (character) - If set to self_state, the output is a dot flowmap of the next state based on the highest transition probability. If set to without_self_state, the output is an arrow flowmap whose arrow size is based on the distance between the two states, pointing to the next state with the highest transition probability excluding the self-state. If set to probability, the output is an arrow flowmap whose arrow size is based on the probability, pointing to the next state with the highest transition probability excluding the self-state. If set to All, all three flowmaps are produced
# * @param **animation_speed** (numeric) - Must be a numeric value and a factor of 100
# * @param **threshold** (numeric) - Ranges from 0.1 to 1. This numeric variable controls the categorization of probability values into "High Probability" and "Low Probability" for the flow map type "Probability"
# * @param **cellid_column** (character) - The column name of the Cell ID in the dataframe passed to this function
# * @param **time_column** (character) - The column name of the Timestamp in the dataframe passed to this function
source("../R/flowmap.R")
plots <- generate_flow_maps(hvt_model_output = hvt.results, transition_probability_df = trans_prob_df, hvt_plot_output = hvt.plot, df = scored_data, animation = "All", flow_map = "All", animation_speed = 2, threshold = 0.7, cellid_column = "Cell.ID", time_column = "t")
Arrow lengths on the Flow map below are based on the distance between the current state and the next state. The distance metric used is Euclidean distance.
The circle around each centroid represents the self-state probability.
Arrow segment length is based on the transition probability.